{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Manipulation, Cleaning, and Exploration\n", "\n", "This notebook reviews several techniques for manipulating and cleaning data. Once the data has been processed, we run an exploratory analysis to extract useful insights.\n", "\n", "## Contents\n", "\n", "Throughout this notebook we cover the following topics:\n", "\n", "- Selecting and filtering data: how to select and filter data efficiently in Python.\n", "- Handling null values: techniques for dealing with missing data.\n", "- Transforming columns: how to transform and manipulate the columns of a DataFrame.\n", "- Descriptive statistics: summary statistics that describe the distribution and trends of the data.\n", "- Correlations: how the variables in the data relate to one another.\n", "\n", "## Practical examples\n", "\n", "We apply these techniques in practical examples to improve data quality and to identify trends and patterns in sales and marketing data.\n", "\n", "## Dataset\n", "\n", "This notebook uses the `bank_marketing` dataset from the UCI Machine Learning Repository, which gives us a good opportunity to apply and practice the techniques covered here." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " age job marital education default balance housing loan \\\n", "0 58 management married tertiary no 2143 yes no \n", "1 44 technician single secondary no 29 yes no \n", "2 33 entrepreneur married secondary no 2 yes yes \n", "3 47 blue-collar married NaN no 1506 yes no \n", "4 33 NaN single NaN no 1 no no \n", "\n", " contact day_of_month month duration campaign pdays previous poutcome \\\n", "0 NaN 5 may 261 1 -1 0 NaN \n", "1 NaN 5 may 151 1 -1 0 NaN \n", "2 NaN 5 may 76 1 -1 0 NaN \n", "3 NaN 5 may 92 1 -1 0 NaN \n", "4 NaN 5 may 198 1 -1 0 NaN \n", "\n", " HasTermDeposit \n", "0 no \n", "1 no \n", "2 no \n", "3 no \n", "4 no " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 45211 entries, 0 to 45210\n", "Data columns (total 17 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 age 45211 non-null int64 \n", " 1 job 44923 non-null object\n", " 2 marital 45211 non-null object\n", " 3 education 43354 non-null object\n", " 4 default 45211 non-null object\n", " 5 balance 45211 non-null int64 \n", " 6 housing 45211 non-null object\n", " 7 loan 45211 non-null object\n", " 8 contact 32191 non-null object\n", " 9 day_of_month 45211 non-null int64 \n", " 10 month 45211 non-null object\n", " 11 duration 45211 non-null int64 \n", " 12 campaign 45211 non-null int64 \n", " 13 pdays 45211 non-null int64 \n", " 14 previous 45211 non-null int64 \n", " 15 poutcome 8252 non-null object\n", " 16 HasTermDeposit 45211 non-null object\n", "dtypes: int64(7), object(10)\n", "memory usage: 5.9+ MB\n" ] }, { "data": { "text/plain": [ "None" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## Loading the data\n", "\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "\n", "data = pd.read_csv('bank_marketing.csv')\n", "\n", "# Show the first rows\n", "display(data.head())\n", "\n", "# General information about the DataFrame\n", "# (info() prints its summary and returns None, which display() then shows as well)\n", "display(data.info())" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Data cleaning\n", "\n", "Data cleaning is a crucial step in any data analysis: accurate, reliable results depend on clean data. In this section we clean and manipulate data using Python and the Pandas library.\n", "\n", "```{note}\n", "Pandas is a Python library that provides data structures and data analysis tools. It is widely used in the data science community and is one of the most popular libraries for data analysis in Python.\n", "\n", "For cleaning and manipulating the data, we use the following Pandas functions and methods:\n", "\n", "- `isnull()`: Returns a boolean mask that marks the null values in a DataFrame.\n", "\n", "- `dropna()`: Removes rows or columns that contain null values.\n", "\n", "- `fillna()`: Fills null values with a specified value.\n", "\n", "- `astype()`: Converts a column to a specified data type.\n", "\n", "- `describe()`: Shows descriptive statistics for a DataFrame.\n", "\n", "- `corr()`: Computes the correlation between the columns of a DataFrame.\n", "\n", "- `plot()`: Creates plots from the data in a DataFrame.\n", "\n", "- `value_counts()`: Counts the occurrences of each unique value in a column.\n", "\n", "```" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "age 0\n", "job 288\n", "marital 0\n", "education 1857\n", "default 0\n", "balance 0\n", "housing 0\n", "loan 0\n", "contact 13020\n", "day_of_month 0\n", "month 0\n", "duration 0\n", "campaign 0\n", "pdays 0\n", "previous 0\n", "poutcome 36959\n", "HasTermDeposit 0\n", "dtype: int64" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Count the missing values in each column\n", "display(data.isnull().sum())\n", "\n", "# If there are missing values, apply an imputation or deletion strategy" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "job\n", "blue-collar 9732\n", "management 9458\n", "technician 7597\n", "admin. 5171\n", "services 4154\n", "retired 2264\n", "self-employed 1579\n", "entrepreneur 1487\n", "unemployed 1303\n", "housemaid 1240\n", "student 938\n", "NaN 288\n", "Name: count, dtype: int64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## The job variable\n", "\n", "data['job'].value_counts(dropna=False)" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "job\n", "blue-collar 9732\n", "management 9458\n", "technician 7597\n", "admin. 5171\n", "services 4154\n", "retired 2264\n", "self-employed 1579\n", "entrepreneur 1487\n", "unemployed 1303\n", "housemaid 1240\n", "student 938\n", "NaN 288\n", "Name: count, dtype: int64\n", "job\n", "blue-collar 10020\n", "management 9458\n", "technician 7597\n", "admin. 5171\n", "services 4154\n", "retired 2264\n", "self-employed 1579\n", "entrepreneur 1487\n", "unemployed 1303\n", "housemaid 1240\n", "student 938\n", "Name: count, dtype: int64\n", "job\n", "blue-collar 9732\n", "management 9458\n", "technician 7597\n", "admin. 5171\n", "services 4154\n", "retired 2264\n", "self-employed 1579\n", "entrepreneur 1487\n", "unemployed 1303\n", "housemaid 1240\n", "student 938\n", "Desconocido 288\n", "Name: count, dtype: int64\n", "job\n", "blue-collar 9732\n", "management 9458\n", "technician 7597\n", "admin. 
5171\n", "services 4154\n", "retired 2264\n", "self-employed 1579\n", "entrepreneur 1487\n", "unemployed 1303\n", "housemaid 1240\n", "student 938\n", "Name: count, dtype: int64\n" ] } ], "source": [ "## Imputation strategies\n", "\n", "# Imputation with the mode (the most frequent category)\n", "# Assigning the result back avoids the chained inplace fillna, which newer pandas versions warn about\n", "\n", "data_1 = data.copy()\n", "data_1['job'] = data_1['job'].fillna(data_1['job'].mode()[0])\n", "\n", "# Imputation with a placeholder category, 'Desconocido' (Spanish for 'unknown')\n", "\n", "data_2 = data.copy()\n", "data_2['job'] = data_2['job'].fillna('Desconocido')\n", "\n", "\n", "# Dropping the rows with missing values\n", "\n", "data_3 = data.copy()\n", "data_3.dropna(subset=['job'], inplace=True)\n", "\n", "# Comparing the strategies: original, mode, placeholder category, dropped rows\n", "\n", "print(data['job'].value_counts(dropna=False))\n", "\n", "print(data_1['job'].value_counts(dropna=False))\n", "\n", "print(data_2['job'].value_counts(dropna=False))\n", "\n", "print(data_3['job'].value_counts(dropna=False))" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "## Handling missing values in numeric variables\n", "\n", "# Toy DataFrame with six missing values in column A\n", "Ejemplo = pd.DataFrame({'A': [1, 2, None, None, 3, 4, None, 5, 6, None, 7, 8, None, 9, 10, None]})" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A\n", "0 1.0\n", "1 2.0\n", "2 NaN\n", "3 NaN\n", "4 3.0\n", "5 4.0\n", "6 NaN\n", "7 5.0\n", "8 6.0\n", "9 NaN\n", "10 7.0\n", "11 8.0\n", "12 NaN\n", "13 9.0\n", "14 10.0\n", "15 NaN" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Ejemplo" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A\n", "0 1.0\n", "1 2.0\n", "2 5.5\n", "3 5.5\n", "4 3.0\n", "5 4.0\n", "6 5.5\n", "7 5.0\n", "8 6.0\n", "9 5.5\n", "10 7.0\n", "11 8.0\n", "12 5.5\n", "13 9.0\n", "14 10.0\n", "15 5.5" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Imputation with the mean\n", "\n", "Ejemplo_1 = Ejemplo.copy()\n", "# Assign the result back instead of calling fillna with inplace=True on a selected column\n", "Ejemplo_1['A'] = Ejemplo_1['A'].fillna(Ejemplo_1['A'].mean())\n", "Ejemplo_1" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A\n", "0 1.0\n", "1 2.0\n", "2 5.5\n", "3 5.5\n", "4 3.0\n", "5 4.0\n", "6 5.5\n", "7 5.0\n", "8 6.0\n", "9 5.5\n", "10 7.0\n", "11 8.0\n", "12 5.5\n", "13 9.0\n", "14 10.0\n", "15 5.5" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Imputation with the median\n", "## (for the values 1 to 10 the median equals the mean, 5.5, so the result matches the previous cell)\n", "\n", "Ejemplo_2 = Ejemplo.copy()\n", "Ejemplo_2['A'] = Ejemplo_2['A'].fillna(Ejemplo_2['A'].median())\n", "Ejemplo_2" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A\n", "0 1.0\n", "1 2.0\n", "2 1.0\n", "3 1.0\n", "4 3.0\n", "5 4.0\n", "6 1.0\n", "7 5.0\n", "8 6.0\n", "9 1.0\n", "10 7.0\n", "11 8.0\n", "12 1.0\n", "13 9.0\n", "14 10.0\n", "15 1.0" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Imputation with the mode\n", "## (every value appears once here, so mode() returns all of them sorted and [0] picks the smallest, 1.0)\n", "\n", "Ejemplo_3 = Ejemplo.copy()\n", "Ejemplo_3['A'] = Ejemplo_3['A'].fillna(Ejemplo_3['A'].mode()[0])\n", "Ejemplo_3" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A\n", "0 1.0\n", "1 2.0\n", "2 -999.0\n", "3 -999.0\n", "4 3.0\n", "5 4.0\n", "6 -999.0\n", "7 5.0\n", "8 6.0\n", "9 -999.0\n", "10 7.0\n", "11 8.0\n", "12 -999.0\n", "13 9.0\n", "14 10.0\n", "15 -999.0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Imputation with a constant value\n", "## (-999 acts as a sentinel that cannot be confused with a real observation)\n", "\n", "Ejemplo_4 = Ejemplo.copy()\n", "Ejemplo_4['A'] = Ejemplo_4['A'].fillna(-999)\n", "Ejemplo_4" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A\n", "0 1.0\n", "1 2.0\n", "4 3.0\n", "5 4.0\n", "7 5.0\n", "8 6.0\n", "10 7.0\n", "11 8.0\n", "13 9.0\n", "14 10.0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Dropping the rows with missing values\n", "\n", "Ejemplo_5 = Ejemplo.copy()\n", "Ejemplo_5.dropna(subset=['A'], inplace=True)\n", "Ejemplo_5" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Correcting data types\n", "\n", "A key step in data cleaning is correcting data types. Data is often stored in a format that is not suitable for analysis; for example, a column that should be numeric may be stored as a string. In this section we correct the data types of a DataFrame.\n", "\n", "````{note}\n", "To correct the data types of a DataFrame we use the Pandas `astype()` method, which converts a column to a specified data type. For example, to convert a column to a numeric type we can use the following code:\n", "\n", "```python\n", "df['column_name'] = df['column_name'].astype('float')\n", "```\n", "````" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "month\n", "may 13766\n", "jul 6895\n", "aug 6247\n", "jun 5341\n", "nov 3970\n", "apr 2932\n", "feb 2649\n", "jan 1403\n", "oct 738\n", "sep 579\n", "mar 477\n", "dec 214\n", "Name: count, dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## The day_of_month and month variables\n", "## (a cell only displays its last expression, so only the month counts appear in the output)\n", "\n", "data['day_of_month'].value_counts(dropna=False)\n", "data['month'].value_counts(dropna=False)" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 5\n", "1 5\n", "2 5\n", "3 5\n", "4 5\n", " ..\n", "45206 17\n", "45207 17\n", "45208 17\n", "45209 17\n", "45210 17\n", "Name: day_of_month, Length: 45211, dtype: object" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### Goal: build a text string of the form dd-mmm\n", "## To concatenate it with month, day_of_month must first be converted to a string\n", "\n", "data['day_of_month'] = data['day_of_month'].astype(str)\n", "data['day_of_month']" ] },
{ "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 5-may-2023\n", "1 5-may-2023\n", "2 5-may-2023\n", "3 5-may-2023\n", "4 5-may-2023\n", " ... 
\n", "45206 17-nov-2023\n", "45207 17-nov-2023\n", "45208 17-nov-2023\n", "45209 17-nov-2023\n", "45210 17-nov-2023\n", "Name: day_month, Length: 45211, dtype: object" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Create the day_month variable\n", "## (the dataset has no year column, so 2023 is appended as a placeholder year)\n", "\n", "data['day_month'] = data['day_of_month'] + '-' + data['month'] + '-2023'\n", "\n", "data['day_month']" ] },
{ "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2023-05-05\n", "1 2023-05-05\n", "2 2023-05-05\n", "3 2023-05-05\n", "4 2023-05-05\n", " ... \n", "45206 2023-11-17\n", "45207 2023-11-17\n", "45208 2023-11-17\n", "45209 2023-11-17\n", "45210 2023-11-17\n", "Name: day_month, Length: 45211, dtype: datetime64[ns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Convert it to a datetime type\n", "## (%d = day of month, %b = abbreviated month name, %Y = four-digit year)\n", "\n", "data['day_month'] = pd.to_datetime(data['day_month'], format='%d-%b-%Y')\n", "data['day_month']" ] },
{ "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 298\n", "1 298\n", "2 298\n", "3 298\n", "4 298\n", " ... 
\n", "45206 102\n", "45207 102\n", "45208 102\n", "45209 102\n", "45210 102\n", "Name: days_to_today, Length: 45211, dtype: int64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### Changing the data type has its advantages: date arithmetic now works directly\n", "from datetime import datetime\n", "\n", "today = datetime.now()\n", "\n", "# Whole days elapsed between each date and the moment this cell runs (the result depends on the run date)\n", "data['days_to_today'] = (today - data['day_month']).dt.days\n", "data['days_to_today']" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 4 }